7th IEEE International Conference on Information Technology and Digital Applications, ICITDA 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2191874

ABSTRACT

In 2020, most Filipinos were using the internet because of COVID-19 pandemic lockdowns. Internet use is not limited to adults, so children may be exposed to online adult content and abuse. Philippine internet service providers fail to block pornographic web pages that are unsuitable for children. A web page classifier would help detect and classify such pages. In this study, a total of 12,000 adult-content and academic web pages were collected using Scrapy and, together with existing datasets from DMOZ, used to create a Support Vector Machine (SVM) multi-class classifier. To improve the accuracy of the SVM model, data preprocessing was performed to remove noisy and irrelevant data from the dataset. The text of the web pages was used to train the SVM classifier with three feature representations: Term Frequency-Inverse Document Frequency (TF-IDF), Count Vectorizer, and Word2vec Skip-gram embeddings with TF-IDF as a multiplier. A series of experiments was conducted using these word embedding techniques. The SVM model built using Word2vec with the TF-IDF multiplier outperforms the SVM models built using TF-IDF and Count Vectorizer. The Word2vec embeddings were generated with a window size of 9 and a vector dimension of 900. The SVM model built using Word2vec shows an S6% accuracy. The SVM model is deployed in the Django framework, and a Chrome plugin was created to use the model through a REST API. © 2022 IEEE.
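
The abstract does not spell out how TF-IDF is applied "as a multiplier" to the Word2vec vectors. One common reading is that each word vector is scaled by the word's TF-IDF weight before the vectors are combined into a single document vector. The sketch below illustrates that reading only; it is not the authors' implementation. The function names, the smoothed IDF formula, the weighted-average combination, and the two-dimensional toy embeddings (standing in for trained 900-dimensional Skip-gram vectors) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights


def weighted_doc_vector(doc_weights, embeddings, dim):
    """Combine word vectors into a TF-IDF-weighted average document vector."""
    total = [0.0] * dim
    weight_sum = 0.0
    for term, weight in doc_weights.items():
        vec = embeddings.get(term)
        if vec is None:  # skip words that have no embedding
            continue
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # all-zero vector when no word is known
    return [x / weight_sum for x in total]


# Hypothetical toy data: two tokenized pages and hand-made 2-d embeddings.
docs = [["free", "adult", "video"], ["university", "course", "video"]]
embeddings = {
    "free": [1.0, 0.0],
    "adult": [1.0, 0.0],
    "video": [0.5, 0.5],
    "university": [0.0, 1.0],
    "course": [0.0, 1.0],
}
vectors = [weighted_doc_vector(w, embeddings, 2) for w in tfidf_weights(docs)]
```

In a full pipeline, document vectors of this kind would serve as the input features of an SVM classifier, for example scikit-learn's `SVC`.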
